Experiments in Data Format Interoperation Using Defuddle

Authors

  • Robert E. McGrath
  • Jason Kastner
  • Jim Myers
Abstract

This document discusses the status of the Defuddle parser and recent work conducted as part of the “Innovative Systems and Software: Applications to NARA Research Problems” project. Robust sharing, reuse, and curation of data require a clean separation of issues related to bits, formats, and logical content. To address these issues, the Open Grid Forum is defining the Data Format Description Language (DFDL) standard for describing the structure of binary and textual files and data streams so that their format, structure, and metadata can be exposed as XML [3]. While this is sufficient for describing the internal layout of data (the “syntax”), interoperability and curation also require a description of logical relationships within and between data sets in terms of globally understood concepts (the “semantics”).

Extending the concept of DFDL and the Defuddle DFDL parser implementation, we have defined a two-step declarative mechanism for describing the structure and relations in binary and ASCII data in terms of vocabularies defined using standard Semantic Web languages (RDF and OWL). We briefly outline our approach below and highlight the potential benefits of exposing internal data semantics in the context of larger semantic systems.

DFDL and Defuddle separate the issues:

  • The DFDL-annotated schema defines the logical structure of the data
  • The DFDL annotations map the bits to the logical structure
  • The XML output of Defuddle provides a standard format with a well-defined logical structure
  • The RDF output of Defuddle provides a description of logical relationships in a standard format

This document describes two demonstrations of data interoperation with Defuddle. These demonstrations explore how Defuddle and DFDL can be used for tasks that are usually implemented with code. The first demonstration explores an aspect of file characterization, namely recognition of the “MIME-type” of a file. The second demonstration illustrates a simple example of recognition and 3D data format interoperation. In both cases, the Defuddle parser software was used without modification. Each application required development of appropriate DFDL-annotated XML schemas to read the binary or text data. In addition, the MIME-type recognizer used the semantic extensions to produce an RDF triple that asserts the MIME-type.

The rest of the paper is organized as follows. Section 2 gives some background on the Defuddle technology. Section 3 explains the file identification demonstration. Section 4 discusses the 3D formats demonstration. Section 5 concludes. Appendices give details of the XML schemas and XSL transforms used.
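For contrast with the declarative approach, the following is a minimal sketch of how MIME-type recognition is conventionally hand-coded, ending in an RDF triple that asserts the type. It is not Defuddle code and not a DFDL schema. The signature table, the function names `recognize_mime_type` and `mime_triple`, and the choice of the Dublin Core `format` property as the predicate are all assumptions made for illustration; the paper's actual vocabulary is not given in this abstract.

```python
# Hypothetical illustration of hand-coded MIME-type recognition.
# Not part of Defuddle; names and the predicate IRI are assumptions.

# A few well-known leading byte signatures ("magic numbers").
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Assumed predicate: the Dublin Core "format" element.
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"


def recognize_mime_type(header: bytes):
    """Return the MIME type whose signature prefixes `header`, or None."""
    for signature, mime in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return mime
    return None


def mime_triple(file_iri: str, header: bytes):
    """Return an N-Triples line asserting the MIME type, or None if unknown."""
    mime = recognize_mime_type(header)
    if mime is None:
        return None
    return f'<{file_iri}> <{DC_FORMAT}> "{mime}" .'


if __name__ == "__main__":
    print(mime_triple("file:///tmp/example.pdf", b"%PDF-1.4\n"))
```

In a DFDL-based approach, the signature table would instead be expressed as annotations on an XML schema, and the triple would be produced by a transform over the parser's XML output rather than by procedural code.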

Similar resources

Towards a Semantic Preservation System

This document discusses the status of the Defuddle parser and recent work conducted as part of the “Innovative Systems and Software: Applications to NARA Research Problems” project. Preserving access to file content requires preserving not just bits but also meaningful logical structures. The ongoing development of the Data Format Description Language (DFDL) is a completely general standard tha...

Mapping Physical Formats to Logical Models to Extract Data and Metadata: The Defuddle Parsing Engine

Scientists, motivated by the desire for systems-level understanding of phenomena, increasingly need to share their results across multiple disciplines. Accomplishing this requires data to be annotated, contextualized, and readily searchable and translated into other formats. While these requirements can be addressed by custom programming or obviated by community standardization, neither approac...

Semantic Extensions to Defuddle: Inserting GRDDL into XML

The product of this task is to add GRDDL as a post-processing stage to Defuddle. GRDDL fires zero or more XSL’s which are supposed to create RDF. Thus, this is a mechanism to specify how to extract RDF metadata from the input files. (GRDDL operates on the XML generated by Defuddle, so, technically, the extractors will only work on data that is in the DFDL generated XML. Data in the original tha...

An Open Repository Model for Acquiring Knowledge About Scientific Experiments

The availability of high-quality metadata is key to facilitating discovery in the large variety of scientific datasets that are increasingly becoming publicly available. However, despite the recent focus on metadata, the diversity of metadata representation formats and the poor support for semantic markup typically result in metadata that are of poor quality. There is a pressing need for a meta...

A New Implementation for Ontology Mapping Based enterprise Semantic Interoperation

In interoperation among enterprises in eBusiness, a major problem is that different enterprises’ systems use different data models and information descriptions, which blocks ambient semantic collaboration. Ontology is an important tool for overcoming syntactic and semantic misunderstanding. Our goal is to provide a user-friendly environment supporting syntax and neutral format dat...

Publication date: 2009